
    Modeling the Performance of MapReduce Applications for the Cloud

    In recent years, Cloud Computing has become a key technology that makes it possible to run applications without deploying a physical infrastructure. The challenge of deploying distributed applications in Cloud Computing environments is that the virtual machine infrastructure should be planned in a time- and cost-effective way. This work summarizes a previous work, presented by the authors as a Master's thesis, whose goal is to show that the execution time of a distributed MapReduce application running in a Cloud Computing environment can be predicted using a mathematical model based on theoretical specifications. This prediction helps users of the Cloud Computing environment plan their deployments, i.e., quantify the number of virtual machines and their characteristics. By measuring the application execution time while varying the parameters stated in the mathematical model, and then applying a linear regression technique, a model of the execution time is obtained, which is then applied to predict the execution time of MapReduce applications. Experiments were conducted in several configurations and showed a clear relation with the theoretical model, revealing that the model is in fact able to predict the execution time of MapReduce applications. The developed model is generic, meaning that it uses theoretical abstractions for the computing capacity of the environment and the computing cost of the MapReduce application.
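
    To make the regression step concrete, the sketch below shows the kind of fit the abstract describes. The feature (input size per virtual machine), the measurements, and the resulting coefficients are hypothetical placeholders, not the authors' actual model terms or data:

    import numpy as np

    # Illustrative only: fit execution time as a linear function of model
    # parameters. The feature here (data per VM) and all numbers are made
    # up; the real terms come from the authors' theoretical model.
    measurements = np.array([
        # input_gb, num_vms, measured_time_s
        [10, 2, 410.0],
        [10, 4, 220.0],
        [20, 2, 790.0],
        [20, 4, 415.0],
        [40, 8, 430.0],
    ])
    input_gb, num_vms, time_s = measurements.T

    # Design matrix: constant overhead + work proportional to data per VM.
    X = np.column_stack([np.ones_like(input_gb), input_gb / num_vms])
    (overhead, slope), *_ = np.linalg.lstsq(X, time_s, rcond=None)

    # Predict the runtime of a planned deployment: 30 GB on 6 VMs.
    print(f"predicted time: {overhead + slope * (30 / 6):.0f} s")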

    BIGhybrid: A Simulator for MapReduce Applications in Hybrid Distributed Infrastructures Validated with the Grid5000 Experimental Platform

    Cloud computing has increasingly been used as a platform for running large business and data processing applications. Conversely, Desktop Grids have been successfully employed in a wide range of projects, because they are able to take advantage of a large number of resources provided free of charge by volunteers. A hybrid infrastructure created from the combination of Cloud and Desktop Grid infrastructures can provide a low-cost and scalable solution for Big Data analysis. Although frameworks like MapReduce have been designed to exploit commodity hardware, their ability to take advantage of a hybrid infrastructure poses significant challenges due to its large resource heterogeneity and high churn rate. This paper proposes BIGhybrid, a simulator for two existing classes of MapReduce runtime environments, BitDew-MapReduce designed for Desktop Grids and BlobSeer-Hadoop designed for Cloud computing, whose goal is to carry out accurate simulations of MapReduce executions in a hybrid infrastructure composed of Cloud computing and Desktop Grid resources. This work describes the principles of the simulator and its validation with the Grid5000 experimental platform. With BIGhybrid, developers can investigate and evaluate new algorithms to enable MapReduce to be executed in hybrid infrastructures, including topics such as resource allocation and data splitting. (Concurrency and Computation: Practice and Experience)

    BIGhybrid - A Toolkit for Simulating MapReduce on Hybrid Infrastructures

    Cloud computing has increasingly been used as a platform for running large business and data processing applications. Although clouds have become highly popular, when it comes to data processing, the cost of usage is not negligible. Conversely, Desktop Grids have been used by a plethora of projects, taking advantage of the high number of resources provided for free by volunteers. Merging cloud computing and desktop grids into a hybrid infrastructure can provide a feasible low-cost solution for big data analysis. Although frameworks like MapReduce have been conceived to exploit commodity hardware, their use on a hybrid infrastructure poses some challenges due to large resource heterogeneity and a high churn rate. This study introduces BIGhybrid, a toolkit to simulate MapReduce on hybrid environments. The main goal is to provide a framework for developers and system designers to address the issues of hybrid MapReduce. In this paper, we describe the framework, which simulates the assembly of two existing middleware systems: BitDew-MapReduce for Desktop Grids and Hadoop-BlobSeer for Cloud Computing. Experimental results included in this work demonstrate the feasibility of our approach.
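
    As a rough intuition for the heterogeneity and churn problem, the toy loop below (not BIGhybrid itself, which is a full simulator) replays map tasks on a mixed pool of stable cloud VMs and volatile volunteer nodes, rescheduling any task whose host departs mid-execution. Speeds and churn probabilities are invented for illustration:

    import random

    random.seed(42)
    nodes = ([{"speed": 1.0, "churn": 0.00}] * 4 +    # stable cloud VMs
             [{"speed": 0.4, "churn": 0.15}] * 12)    # volatile volunteers

    def simulate(num_tasks, work=10.0):
        compute_time, reexecutions = 0.0, 0
        for _ in range(num_tasks):
            while True:
                node = random.choice(nodes)
                if random.random() < node["churn"]:   # host left: reschedule
                    reexecutions += 1
                    continue
                compute_time += work / node["speed"]  # slower node, longer task
                break
        return compute_time, reexecutions

    t, retries = simulate(100)
    print(f"aggregate compute time: {t:.0f} s, tasks re-executed: {retries}")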

    Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

    A significant rise in the adoption of streaming applications has changed decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, the Spark Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the evaluation points out the Spark Streaming limitations that lead to in-memory-based issues for data-intensive pipelines and stateful applications. In addition, the work indicates potential solutions.
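
    For readers who want to reproduce this kind of setup, backpressure in Spark Streaming (the DStream API) is switched on through standard configuration properties; a minimal PySpark sketch follows. The rate values and the socket source are placeholders, not the settings evaluated in the paper:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = (SparkConf()
            .setAppName("backpressure-demo")
            # Let the rate controller adapt ingestion to processing speed.
            .set("spark.streaming.backpressure.enabled", "true")
            # Cap the first batch, before any rate feedback exists.
            .set("spark.streaming.backpressure.initialRate", "1000")
            # Hard upper bound per receiver (records/s), placeholder value.
            .set("spark.streaming.receiver.maxRate", "10000"))

    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, batchDuration=1)       # 1-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)   # hypothetical source
    lines.count().pprint()                            # records per batch

    ssc.start()
    ssc.awaitTermination()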

    Impact: an Unreliable Failure Detector Based on Processes' Relevance and the Confidence Degree in the System

    This technical report presents a new unreliable failure detector, called the Impact failure detector (FD), which, unlike the majority of traditional FDs, outputs a trust level value that expresses the degree of confidence in the system. An impact factor is assigned to each node, and the trust level is equal to the sum of the impact factors of the nodes not suspected of failure. Moreover, a threshold parameter defines a lower bound for the trust level, above which the confidence in the system is ensured. In particular, we define a flexibility property that denotes the capacity of the Impact FD to tolerate a certain margin of failures or false suspicions, i.e., its capacity to consider different sets of responses that lead the system to trusted states. The Impact FD is suitable for systems that present node redundancy, heterogeneity of nodes, and a clustering feature, and that allow a margin of failures which does not degrade the confidence in the system. The technical report also includes a timer-based distributed algorithm which implements an Impact FD, as well as its proof of correctness, for systems whose links are lossy asynchronous or for those in which all (or some) links are eventually timely. Performance evaluation results based on real PlanetLab traces confirm the flexible applicability of our failure detector and show that, due to the accepted margin of failure, both failures and false suspicions are better tolerated than by traditional unreliable failure detectors. We also show the equivalence of some Impact FD classes with the Sigma and Omega classes, which are fundamental classes for circumventing the impossibility of consensus in asynchronous message-passing distributed systems.
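
    A small sketch may help fix the idea: the trust level is the sum of the impact factors of the processes not currently suspected, and the system is trusted while that sum stays at or above the threshold. The impact factors and threshold below are invented for illustration:

    # Hypothetical impact factors, e.g. two primary nodes and three replicas.
    impact = {"p1": 3, "p2": 3, "p3": 1, "p4": 1, "p5": 1}
    threshold = 6  # lower bound on the trust level chosen by the application

    def trust_level(suspected):
        return sum(f for p, f in impact.items() if p not in suspected)

    # Flexibility: several distinct suspicion sets still leave the system
    # in a trusted state, so a margin of failures or false suspicions is
    # absorbed without degrading confidence.
    for suspected in [set(), {"p3"}, {"p3", "p4"}, {"p1", "p3"}]:
        tl = trust_level(suspected)
        print(sorted(suspected), tl, "trusted" if tl >= threshold else "untrusted")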

    MultiS: A Context-Server for Pervasive Computing

    Context-aware applications are capable of recognizing environmental changes and adapting their behavior to the new context. This process can be divided into three stages: monitoring, context recognition, and adaptation. At the monitoring layer, raw information about the environment is collected from sensors. The context recognition layer processes the data acquired from the context and transforms it into information that can be useful for the adaptation process. With this information, the adaptation system can determine what behavior is correct for the application in each different context. This paper proposes a context server called MultiS, which has the goal of solving the problems arising at the context recognition layer, and which offers the following advantages: a) the production of new context data based on information from several sensors, and the ability to react to changes in the environment; b) the definition of a language for composing context data, called CD-XML; c) support for mobility.
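
    As a toy illustration of the context recognition layer (the rules, names, and thresholds are hypothetical, and CD-XML itself is not reproduced here), the sketch below composes raw readings from several sensors into a higher-level context value and notifies subscribers only when that value changes:

    from typing import Callable

    subscribers: list[Callable[[str], None]] = []
    current_context = None

    def recognize(readings: dict[str, float]) -> str:
        # Produce new context data from several sensors' raw values.
        if readings["lux"] < 50 and readings["noise_db"] < 30:
            return "room-idle"
        return "room-occupied"

    def on_readings(readings: dict[str, float]) -> None:
        global current_context
        ctx = recognize(readings)
        if ctx != current_context:        # react only to context changes
            current_context = ctx
            for notify in subscribers:
                notify(ctx)

    subscribers.append(lambda ctx: print("adapt to:", ctx))
    on_readings({"lux": 20, "noise_db": 25})    # -> adapt to: room-idle
    on_readings({"lux": 300, "noise_db": 60})   # -> adapt to: room-occupied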

    Impact FD: An Unreliable Failure Detector Based on Process Relevance and Confidence in the System

    This paper presents a new unreliable failure detector, called the Impact failure detector (FD), which, unlike the majority of traditional FDs, outputs a trust level value that expresses the degree of confidence in the system. An impact factor is assigned to each process, and the trust level is equal to the sum of the impact factors of the processes not suspected of failure. Moreover, a threshold parameter defines a lower bound for the trust level, above which the confidence in the system is ensured. In particular, we define a flexibility property that denotes the capacity of the Impact FD to tolerate a certain margin of failures or false suspicions, i.e., its capacity to consider different sets of responses that lead the system to trusted states. The Impact FD is suitable for systems that present node redundancy, heterogeneity of nodes, and a clustering feature, and that allow a margin of failures which does not degrade the confidence in the system. The paper also includes a timer-based distributed algorithm which implements an Impact FD, as well as its proof of correctness, for systems whose links are lossy asynchronous or for those in which all (or some) links are eventually timely. Performance evaluation results, based on PlanetLab [1] traces, confirm the flexible applicability of our failure detector and show that, due to the accepted margin of failure, both failures and false suspicions are better tolerated than by traditional unreliable failure detectors.